Digital Preservation

In library and archival science, digital preservation is a formal endeavor to ensure that digital information of continuing value remains accessible and usable. It involves planning, resource allocation, and the application of preservation methods and technologies (Day, Michael. "The long-term preservation of Web content". Web Archiving (Berlin: Springer, 2006), pp. 177-199), and it combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time (Evans, Mark; Carter, Laura (December 2008). The Challenges of Digital Preservation. Presentation at the Library of Parliament, Ottawa). The Association for Library Collections and Technical Services Preservation and Reformatting Section of the American Library Association defined digital preservation as a combination of "policies, strategies and actions that ensure access to digital content over time." According to ''Harrod's Librarian Glossary'', digital preservation is the method of keeping digital material alive so that it remains usable as technological advances render the original hardware and software specifications obsolete.

The need for digital preservation arises mainly from the relatively short lifespan of digital media. Widely used hard drives can become unusable within a few years for a variety of reasons, such as damaged spindle motors. Flash memory (found in SSDs, phones, USB flash drives, and memory cards such as SD, microSD, and CompactFlash cards) can start to lose data around a year after its last use, depending on its storage temperature and how much data has been written to it during its lifetime. 5D optical data storage currently has the potential to store digital data for thousands of years. Archival disc-based media is available, but it is designed to last only 50 years and is a proprietary format sold by just two Japanese companies, Sony and Panasonic. M-DISC is a DVD-based format that claims to retain data for 1,000 years, but writing to it requires special optical disc drives, and reading the data it contains requires increasingly uncommon optical disc drives; in addition, the company behind the format went bankrupt. Data stored on LTO tapes requires periodic migration, as older tapes cannot be read by newer LTO tape drives. RAID arrays can protect against the failure of a single hard drive, although care must be taken not to mix the drives of one array with those of another.


Fundamentals


Appraisal

Archival appraisal (or, alternatively, selection) refers to the process of identifying records and other materials to be preserved by determining their permanent value. Several factors are usually considered when making this decision. It is a difficult and critical process because the selected records that remain will shape researchers' understanding of that body of records, or fonds. Appraisal is identified as A4.2 within the Chain of Preservation (COP) model created by the InterPARES 2 project. Archival appraisal is not the same as monetary appraisal, which determines fair market value. Archival appraisal may be performed once or at various stages of acquisition and processing. Macro appraisal, a functional analysis of records at a high level, may be performed even before the records have been acquired, to determine which records to acquire. More detailed, iterative appraisal may be performed while the records are being processed. Appraisal is performed on all archival materials, not just digital ones. It has been proposed that, in the digital context, it might be desirable to retain more records than have traditionally been retained after appraisal of analog records, primarily because of the declining cost of storage and the availability of sophisticated discovery tools that allow researchers to find value in records of low information density. In the analog context, these records might have been discarded or only a representative sample kept. However, the selection, appraisal, and prioritization of materials must be carefully considered in relation to an organization's ability to manage the totality of these materials responsibly. Libraries, and to a lesser extent archives, are often offered the same materials in several different digital or analog formats. They prefer to select the format that they judge to have the greatest potential for long-term preservation of the content. The Library of Congress has created a set of recommended formats for long-term preservation. These would be used, for example, if the Library were offered items for copyright deposit directly from a publisher.


Identification (identifiers and descriptive metadata)

In digital preservation and collection management, discovery and identification of objects are aided by the use of assigned identifiers and accurate descriptive metadata. An identifier is a unique label used to reference an object or record, usually manifested as a number or a string of numbers and letters. As a crucial element of metadata to be included in a database record or inventory, it is used in tandem with other descriptive metadata to differentiate objects and their various instantiations (Greenberg, Jane. "Understanding Metadata and Metadata Schemes." Cataloging & Classification Quarterly 40.3-4 (2005): 17-36; National Information Standards Organization). Descriptive metadata refers to information about an object's content, such as title, creator, subject, and date. Determining the elements used to describe an object is facilitated by the use of a metadata schema. Extensive descriptive metadata about a digital object helps to minimize the risk of the object becoming inaccessible. Another common type of file identification is the filename. Implementing a file naming protocol is essential to maintaining consistency and to efficient discovery and retrieval of objects in a collection, and is especially applicable during digitization of analog media. Using a file naming convention, such as the 8.3 filename or the Warez standard naming, helps ensure compatibility with other systems and facilitates migration of data. The choice between descriptive file names (containing descriptive words and numbers) and non-descriptive file names (often randomly generated numbers) is generally determined by the size and scope of a given collection. However, filenames are not good for semantic identification, because they are non-permanent labels for a specific location on a system and can be modified without affecting the bit-level profile of a digital file.
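A file naming protocol is easier to keep consistent when it is checked by a script rather than by eye. The following is a minimal sketch in Python, assuming a simplified reading of the 8.3 convention (a base name of up to eight characters and an optional extension of up to three). The allowed character set is an assumption and is stricter than what DOS accepted, and the helper names `is_8_3_name` and `sequential_name` are hypothetical.

```python
import re

# Simplified 8.3 pattern: 1-8 character base name, optional 1-3 character extension.
# Assumption: only A-Z, 0-9, underscore, and hyphen are allowed, which is stricter
# than the character set DOS actually accepted.
_SHORT_NAME = re.compile(r"[A-Z0-9_-]{1,8}(\.[A-Z0-9_-]{1,3})?")


def is_8_3_name(filename: str) -> bool:
    """Return True if filename fits the simplified 8.3 pattern above."""
    return _SHORT_NAME.fullmatch(filename.upper()) is not None


def sequential_name(prefix: str, number: int, extension: str) -> str:
    """Build a non-descriptive, zero-padded name that fills the 8-character base."""
    digits = 8 - len(prefix)
    if digits < 1 or number < 0 or number >= 10 ** digits:
        raise ValueError("prefix and number do not fit in an 8-character base name")
    name = f"{prefix}{number:0{digits}d}.{extension}".upper()
    if not is_8_3_name(name):
        raise ValueError(f"{name!r} does not fit the 8.3 pattern")
    return name
```

For instance, `sequential_name("img", 42, "tif")` yields `IMG00042.TIF`, while a descriptive name such as `annual_report_2005.tiff` is rejected by `is_8_3_name`. A collection that prefers descriptive names would swap in its own pattern; the point of the sketch is only that the convention is enforced mechanically.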


Integrity

The cornerstone of digital preservation, "data integrity" refers to the assurance that the data is "complete and unaltered in all essential respects"; a program designed to maintain integrity aims to "ensure data is recorded exactly as intended, and upon later retrieval, ensure the data is the same as it was when it was originally recorded". Unintentional changes to data are to be avoided, and responsible strategies should be put in place to detect such changes and to react as appropriately determined. However, digital preservation efforts may necessitate modifications to content or metadata through responsibly developed procedures and well-documented policies. Organizations or individuals may choose to retain original, integrity-checked versions of content and/or modified versions with appropriate preservation metadata. Data integrity practices also apply to modified versions, as their state of capture must be maintained and protected against unintentional modification. The integrity of a record can be preserved through bit-level preservation, fixity checking, and capturing a full audit trail of all preservation actions performed on the record. These strategies can protect against unauthorised or accidental alteration.


Fixity

File fixity is the property of a digital file being fixed, or unchanged. File fixity checking is the process of validating that a file has not changed or been altered from a previous state. This effort is often enabled by the creation, validation, and management of checksums. While checksums are the primary mechanism for monitoring fixity at the individual file level, an important additional consideration is file attendance. Whereas checksums identify whether a file has changed, file attendance identifies whether a file in a designated collection has been newly created, deleted, or moved. Tracking and reporting on file attendance is a fundamental component of digital collection management and fixity.
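The distinction between checksum validation and file attendance can be illustrated with a short Python sketch. It assumes SHA-256 as the checksum algorithm and a manifest held as a plain dictionary; both are illustrative choices, and the function names are hypothetical. A file is treated as moved when a path disappears and a new path carries the same checksum, which is a heuristic rather than a guarantee.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Report fixity failures (changed) and attendance events (added, deleted, moved)."""
    # Fixity: same path, different checksum.
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    gone = old.keys() - new.keys()
    fresh = new.keys() - old.keys()
    # Attendance: index the new paths by checksum so vanished paths can be matched.
    fresh_by_sum: dict[str, list] = {}
    for p in sorted(fresh):
        fresh_by_sum.setdefault(new[p], []).append(p)
    moved, deleted = [], []
    for p in sorted(gone):
        candidates = fresh_by_sum.get(old[p])
        if candidates:
            moved.append((p, candidates.pop(0)))
        else:
            deleted.append(p)
    moved_targets = {dst for _, dst in moved}
    added = sorted(p for p in fresh if p not in moved_targets)
    return {"changed": changed, "added": added, "deleted": deleted, "moved": moved}
```

In use, a manifest would be built once when content is received and stored alongside it, then rebuilt on a schedule and compared against the stored copy. Any entry under `changed` signals a fixity failure to investigate, while `added`, `deleted`, and `moved` are the attendance report.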


Characterization

Characterization of digital materials is the identification and description of what a file is and of its defining technical characteristics. These are often captured as technical metadata, which records technical attributes such as the file's creation or production environment.
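One common starting point for characterization is to identify a file's format from its leading bytes rather than from its filename. The Python sketch below assumes a small, deliberately incomplete table of well-known file signatures; the function names and the shape of the returned record are hypothetical, and dedicated characterization tools extract far more technical metadata than this.

```python
from pathlib import Path

# A handful of well-known file signatures ("magic numbers"); deliberately incomplete.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify_format(header: bytes) -> str:
    """Return a format label for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def characterize(path: Path) -> dict:
    """Collect a minimal technical-metadata record for one file."""
    with path.open("rb") as handle:
        header = handle.read(16)
    return {
        "filename": path.name,
        "size_bytes": path.stat().st_size,
        "format": identify_format(header),
    }
```

Because the label comes from the file's content, a TIFF image that has been renamed to `scan.bin` is still reported as TIFF, which is the behavior a repository wants when filenames cannot be trusted.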


Sustainability

Digital sustainability encompasses a range of issues and concerns that contribute to the longevity of digital information. Unlike traditional, temporary strategies and more permanent solutions, digital sustainability implies a more active and continuous process. It concentrates less on any particular solution or technology and more on building an infrastructure and approach that is flexible, with an emphasis on interoperability, continued maintenance, and continuous development. Digital sustainability incorporates activities in the present that will facilitate access and availability in the future. The ongoing maintenance necessary for digital preservation is analogous to the successful, centuries-old community upkeep of the Uffington White Horse (according to Stuart M. Shieber) or the Ise Grand Shrine (according to Jeffrey Schnapp).


Renderability

Renderability refers to the continued ability to use and access a digital object while maintaining its inherent significant properties.


Physical media obsolescence

Physical media obsolescence can occur when access to digital content requires external dependencies that are no longer manufactured, maintained, or supported. External dependencies can refer to hardware, software, or physical carriers. For example, DLT tape was used for backups and data preservation, but is no longer used.


Format obsolescence

File format obsolescence can occur when adoption of new encoding formats supersedes use of existing formats, or when associated presentation tools are no longer readily available. While the use of file formats will vary among archival institutions according to their capabilities, there is documented acceptance within the field that chosen file formats should be "open, standard, non-proprietary, and well-established" to enable long-term archival use. Factors to consider when selecting sustainable file formats include disclosure, adoption, transparency, self-documentation, external dependencies, impact of patents, and technical protection mechanisms. Other considerations include "format longevity and maturity, adaptation in relevant professional communities, incorporated information standards, and long-term accessibility of any required viewing software". For example, the Smithsonian Institution Archives considers uncompressed TIFFs to be "a good preservation format for born-digital and digitized still images because of its maturity, wide adaptation in various communities, and thorough documentation". Formats proprietary to one software vendor are more likely to be affected by format obsolescence. Well-used standards such as Unicode and JPEG are more likely to be readable in the future.


Significant properties

Significant properties refer to the "essential attributes of a digital object which affect its appearance, behavior, quality and usability" and which "must be preserved over time for the digital object to remain accessible and meaningful." "Proper understanding of the significant properties of digital objects is critical to establish best practice approaches to digital preservation. It assists appraisal and selection, processes in which choices are made about which significant properties of digital objects are worth preserving; it helps the development of preservation metadata, the assessment of different preservation strategies and informs future work on developing common standards across the preservation community."


Authenticity

Whether analog or digital, archives strive to maintain records as trustworthy representations of what was originally received. Authenticity has been defined as ". . . the trustworthiness of a record as a record; i.e., the quality of a record that is what it purports to be and that is free from tampering or corruption". Authenticity should not be confused with accuracy; an inaccurate record may be acquired by an archives and have its authenticity preserved. The content and meaning of that inaccurate record will remain unchanged. A combination of policies, security procedures, and documentation can be used to ensure and provide evidence that the meaning of the records has not been altered while in the archives' custody.


Access

Digital preservation efforts exist largely to enable decision-making in the future. If an archive or library chooses a particular strategy to enact, the content and associated metadata must persist so that actions can be taken, or not taken, at the discretion of the controlling party.


Preservation metadata

Preservation metadata is a key enabler for digital preservation. It includes technical information about digital objects, information about a digital object's components and its computing environment, and information that documents the preservation process and the underlying rights basis. It allows organizations or individuals to understand the chain of custody. Preservation Metadata: Implementation Strategies (PREMIS) is the de facto standard that defines the implementable, core preservation metadata needed by most repositories and institutions. It includes guidelines and recommendations for its usage, and has developed shared community vocabularies.


Intellectual foundations


''Preserving Digital Information'' (1996)

The challenges of long-term preservation of digital information have been recognized by the archival community for years. In December 1994, the Research Libraries Group (RLG) and the Commission on Preservation and Access (CPA) formed a Task Force on Archiving of Digital Information, with the main purpose of investigating what needed to be done to ensure long-term preservation of and continued access to digital records. The final report published by the Task Force (Garrett, J. and Waters, D., ed. (1996). "Preserving digital information: Report of the task force on archiving of digital information.") became a fundamental document in the field of digital preservation that helped set out key concepts, requirements, and challenges. The Task Force proposed development of a national system of digital archives that would take responsibility for long-term storage of and access to digital information; introduced the concept of trusted digital repositories and defined their roles and responsibilities; identified five features of digital information integrity (content, fixity, reference, provenance, and context) that were subsequently incorporated into a definition of Preservation Description Information in the Open Archival Information System Reference Model; and defined migration as a crucial function of digital archives. The concepts and recommendations outlined in the report laid a foundation for subsequent research and digital preservation initiatives.


OAIS

To standardize digital preservation practice and provide a set of recommendations for preservation program implementation, the Reference Model for an Open Archival Information System (OAIS) was developed; a revised version was published in 2012. OAIS is concerned with all technical aspects of a digital object's life cycle: ingest, archival storage, data management, administration, access, and preservation planning. The model also addresses metadata issues and recommends that five types of metadata be attached to a digital object: reference (identification) information, provenance (including preservation history), context, fixity (authenticity indicators), and representation (formatting, file structure, and what "imparts meaning to an object's bitstream") (Cornell University Library (2005). Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems).


Trusted Digital Repository Model

In March 2000, the Research Libraries Group (RLG) and the Online Computer Library Center (OCLC) began a collaboration to establish attributes of a digital repository for research organizations, building on and incorporating the emerging international standard of the Reference Model for an Open Archival Information System (OAIS). In 2002, they published "Trusted Digital Repositories: Attributes and Responsibilities." In that document, a "Trusted Digital Repository" (TDR) is defined as "one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future." A TDR must have the following seven attributes: compliance with the Reference Model for an Open Archival Information System (OAIS), administrative responsibility, organizational viability, financial sustainability, technological and procedural suitability, system security, and procedural accountability. The Trusted Digital Repository Model outlines relationships among these attributes. The report also recommended the collaborative development of digital repository certifications, models for cooperative networks, and sharing of research and information on digital preservation with regard to intellectual property rights. In 2004, Henry M. Gladney proposed another approach to digital object preservation that called for the creation of "Trustworthy Digital Objects" (TDOs). TDOs are digital objects that can speak to their own authenticity, since they incorporate a record of their use and change history, which allows future users to verify that the contents of the object are valid.


InterPARES

International Research on Permanent Authentic Records in Electronic Systems (InterPARES) is a collaborative research initiative led by the University of British Columbia that is focused on addressing issues of long-term preservation of authentic digital records. The research is being conducted by focus groups from various institutions in North America, Europe, Asia, and Australia, with the objective of developing theories and methodologies that provide the basis for strategies, standards, policies, and procedures necessary to ensure the trustworthiness, reliability, and accuracy of digital records over time. Under the direction of archival science professor Luciana Duranti, the project began in 1999 with its first phase, InterPARES 1, which ran to 2001 and focused on establishing requirements for authenticity of inactive records generated and maintained in large databases and document management systems created by government agencies. InterPARES 2 (2002–2007) concentrated on issues of reliability, accuracy and authenticity of records throughout their whole life cycle, and examined records produced in dynamic environments in the course of artistic, scientific and online government activities. The third five-year phase (InterPARES 3) was initiated in 2007. Its goal is to utilize theoretical and methodological knowledge generated by InterPARES and other preservation research projects for developing guidelines, action plans, and training programs on long-term preservation of authentic records for small and medium-sized archival organizations.


Challenges

Society's heritage has been presented on many different materials, including stone, vellum, bamboo, silk, and paper. Now a large quantity of information exists in digital forms, including emails, blogs, social networking websites, national elections websites, web photo albums, and sites which change their content over time. With digital media it is easier to create content and keep it up-to-date, but at the same time there are many challenges in the preservation of this content, both technical and economic. Unlike traditional analog objects such as books or photographs where the user has unmediated access to the content, a digital object always needs a software environment to render it. These environments keep evolving and changing at a rapid pace, threatening the continuity of access to the content. Physical storage media, data formats, hardware, and software all become obsolete over time, posing significant threats to the survival of the content. This process can be referred to as digital obsolescence. In the case of
born-digital The term born-digital refers to materials that originate in a digital form.NDIIPP"Preserving Digital Culture,"Library of Congress. This is in contrast to digital reformatting, through which analog materials become digital, as in the case of fil ...
content (e.g., institutional archives, websites, electronic audio and video content, born-digital photography and art, research data sets, observational data), the enormous and growing quantity of content presents significant scaling issues to digital preservation efforts. Rapidly changing technologies can hinder digital preservationists' work and techniques due to outdated and antiquated machines or technology. This has become a common problem and one that is a constant worry for a digital archivist—how to prepare for the future. Digital content can also present challenges to preservation because of its complex and dynamic nature, e.g., interactive Web pages,
virtual reality Virtual reality (VR) is a simulated experience that employs pose tracking and 3D near-eye displays to give the user an immersive feel of a virtual world. Applications of virtual reality include entertainment (particularly video games), e ...
and
gaming Gaming may refer to: Games and sports The act of playing games, as in: * Legalized gambling, playing games of chance for money, often referred to in law as "gaming" * Playing a role-playing game, in which players assume fictional roles * Playin ...
environments, learning objects, social media sites. In many cases of emergent technological advances there are substantial difficulties in maintaining the authenticity, fixity, and integrity of objects over time deriving from the fundamental issue of experience with that particular digital storage medium and while particular technologies may prove to be more robust in terms of storage capacity, there are issues in securing a framework of measures to ensure that the object remains fixed while in stewardship. For the preservation of
software as digital content, a specific challenge is that the source code is typically unavailable, as commercial software is normally distributed only in compiled binary form. Without the source code, an adaptation (porting) to modern computing hardware or operating systems is most often impossible, so the original hardware and software context needs to be emulated. Another potential challenge for software preservation can be the
copyright, which often prohibits the bypassing of copy protection mechanisms (Digital Millennium Copyright Act), even when software has become an orphaned work (abandonware). In 2003, an exemption from the United States Digital Millennium Copyright Act permitting the bypassing of copy protection was approved for a period of three years for the Internet Archive, which created an archive of "vintage software" as a way to preserve it. The exemption was renewed in 2006 and has been indefinitely extended pending further rulemakings "for the purpose of preservation or archival reproduction of published digital works by a library or archive". The GitHub Archive Program has stored all of
GitHub's open source code in a secure vault at Svalbard, on the frozen Norwegian island of Spitsbergen, as part of the Arctic World Archive, with the code stored as QR codes.

Another challenge surrounding preservation of digital content resides in the issue of scale. The amount of digital information being created, along with the "proliferation of format types", makes creating trusted digital repositories with adequate and sustainable resources a challenge. The Web is only one example of what might be considered the "data deluge". For example, the Library of Congress amassed 170 billion tweets between 2006 and 2010, totaling 133.2 terabytes, and each tweet is composed of 50 fields of metadata.

The economic challenges of digital preservation are also great. Preservation programs require significant up-front investment to create, along with ongoing costs for data ingest, data management, data storage, and staffing. One of the key strategic challenges to such programs is the fact that, while they require significant current and ongoing funding, their benefits accrue largely to future generations.


Layers of archiving

The various levels of security may be represented as three layers. The "hot" (accessible online repositories) and "warm" (e.g. Internet Archive) layers both have the weakness of being founded upon electronics: both would be wiped out in a repeat of the powerful 19th-century geomagnetic storm known as the "Carrington Event". As the third layer, the Arctic World Archive, stored on specially developed film coated with silver halide with a lifespan of 500+ years, represents a more secure snapshot of data, with archiving intended at five-year intervals.


Strategies

In 2006, the Online Computer Library Center developed a four-point strategy for the long-term preservation of digital objects that consisted of:
* Assessing the risks for loss of content posed by technology variables such as commonly used proprietary file formats and software applications.
* Evaluating the digital content objects to determine what type and degree of format conversion or other preservation actions should be applied.
* Determining the appropriate metadata needed for each object type and how it is associated with the objects.
* Providing access to the content.

There are several additional strategies that individuals and organizations may use to actively combat the loss of digital information.


Refreshing

''Refreshing'' is the transfer of data between two instances of the same type of storage medium, so that bit rot does not change or alter the data. An example is transferring census data from an old preservation CD to a new one. This strategy may need to be combined with migration when the software or hardware required to read the data is no longer available or is unable to understand the format of the data. Refreshing will likely always be necessary due to the deterioration of physical media.
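
A refresh is only useful if the new copy can be shown to carry the same bits as the old one. The following is a minimal sketch of such a check, assuming SHA-256 as the fixity algorithm; the function names `sha256_of` and `refresh` are hypothetical and do not come from any preservation standard.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target and confirm the bits are unchanged.

    Hypothetical helper: raises OSError if the digests differ.
    """
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise OSError(f"fixity check failed for {target}")
    return after
```

Storing the returned digest alongside the object would let a later refresh compare against a recorded value rather than trusting the current state of the old medium.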


Migration

''Migration'' is the transferring of data to newer system environments (Garrett et al., 1996). This may include conversion of resources from one file format to another (e.g., conversion of Microsoft Word to PDF or OpenDocument) or from one operating system to another (e.g., Windows to Linux) so the resource remains fully accessible and functional. Two significant problems face migration as a plausible method of digital preservation in the long term. First, because digital objects are subject to a state of near-continuous change, migration may cause problems in relation to authenticity. Second, migration has proven to be time-consuming and expensive for "large collections of heterogeneous objects, which would need constant monitoring and intervention". Migration can be a very useful strategy for preserving data stored on external storage media (e.g. CDs, USB flash drives, and 3.5" floppy disks). These types of devices are generally not recommended for long-term use, and the data can become inaccessible due to media and hardware obsolescence or degradation.
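
Because migration changes the bits of an object, one way to support later authenticity checks is to record what was converted and when. The sketch below is illustrative only: it uses a CSV to JSON conversion as a stand-in for a real format migration (chosen because it needs only the standard library), and the function name and the fields of the returned record are assumptions rather than part of any metadata standard.

```python
import csv
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def migrate_csv_to_json(source: Path, target: Path) -> dict:
    """Convert a CSV file to JSON and return a provenance record.

    The record layout is hypothetical; it simply ties the migrated
    file back to the original through checksums.
    """
    with source.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    target.write_text(
        json.dumps(rows, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return {
        "event": "migration",
        "source_format": "text/csv",
        "target_format": "application/json",
        "source_sha256": hashlib.sha256(source.read_bytes()).hexdigest(),
        "target_sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
        "row_count": len(rows),
        "migrated_at": datetime.now(timezone.utc).isoformat(),
    }
```

Keeping the original file next to the migrated one, together with such a record, leaves room to repeat the migration later with a better converter.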


Replication

Creating duplicate copies of data on one or more systems is called ''replication''. Data that exists as a single copy in only one location is highly vulnerable to software or hardware failure, intentional or accidental alteration, and environmental catastrophes like fire, flooding, etc. Digital data is more likely to survive if it is replicated in several locations. Replicated data may introduce difficulties in refreshing, migration, versioning, and access control since the data is located in multiple places. Understanding digital preservation means comprehending how digital information is produced and reproduced. Because digital information (e.g., a file) can be exactly replicated down to the bit level, it is possible to create identical copies of data. Exact duplicates allow archives and libraries to manage, store, and provide access to identical copies of data across multiple systems and/or environments.
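
Since replicas are meant to be identical down to the bit level, a periodic audit can compare checksums across copies and flag any that have drifted. The following is a rough sketch under two assumptions that are not from the text above: SHA-256 is used as the checksum, and the digest shared by the most copies is treated as correct. The name `audit_replicas` is hypothetical.

```python
import hashlib
from collections import Counter
from pathlib import Path


def audit_replicas(replicas: list[Path]) -> dict:
    """Compare replicas by checksum and report copies that disagree.

    Assumption: the most common digest is the intact one. With an
    even split there is no real majority, so the result should then
    be reviewed by hand.
    """
    digests = {
        path: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in replicas
    }
    consensus, votes = Counter(digests.values()).most_common(1)[0]
    divergent = [p for p, digest in digests.items() if digest != consensus]
    return {"consensus": consensus, "votes": votes, "divergent": divergent}
```

A divergent copy found this way could be replaced from one of the agreeing replicas, which is one reason to keep at least three copies rather than two.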


Emulation

''Emulation'' is the replication of the functionality of an obsolete system. According to van der Hoeven, "Emulation does not focus on the digital object, but on the hard- and software environment in which the object is rendered. It aims at (re)creating the environment in which the digital object was originally created." Emulation can mean replicating or imitating another operating system or hardware platform; examples include emulating an Atari 2600 on a Windows system or emulating WordPerfect 1.0 on a Macintosh. Emulators may be built for applications, operating systems, or hardware platforms. Emulation has been a popular strategy for retaining the functionality of old video game systems, such as with the MAME project. The feasibility of emulation as a catch-all solution has been debated in the academic community (Granger, 2000). Raymond A. Lorie has suggested a
Universal Virtual Computer (UVC) that could be used to run any software in the future on a yet unknown platform. The UVC strategy uses a combination of emulation and migration, and it has not yet been widely adopted by the digital preservation community. Jeff Rothenberg, a major proponent of emulation for digital preservation in libraries, working in partnership with the Koninklijke Bibliotheek and the Nationaal Archief of the Netherlands, developed a software program called Dioscuri, a modular emulator that succeeds in running MS-DOS, WordPerfect 5.1, DOS games, and more. Another example of emulation as a form of digital preservation is Emory University's handling of Salman Rushdie's papers. Rushdie donated an outdated computer to the Emory University library, which was so old that the library was unable to extract papers from the hard drive. In order to procure the papers, the library emulated the old software system and was able to take the papers off his old computer.


Encapsulation

This method maintains that preserved objects should be self-describing, virtually "linking content with all of the information required for it to be deciphered and understood". The files associated with the digital object would have details of how to interpret that object by using "logical structures called "containers" or "wrappers" to provide a relationship between all information components that could be used in future development of emulators, viewers or converters through machine readable specifications". (Solutions Walkthrough Report, José Miguel Araújo Ferreira, Department of Information Systems, University of Minho, 4800-058 Guimarães, Portugal.)
The method of encapsulation is usually applied to collections that will go unused for long periods of time.
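
As a rough illustration of a "container" or "wrapper", the sketch below bundles a file with a machine-readable description of how to interpret it. Everything specific here is an assumption made for the example: the use of a ZIP file as the container, the `manifest.json` name, the `payload/` folder, and the manifest fields are hypothetical and do not follow any particular packaging standard.

```python
import json
import zipfile
from pathlib import Path


def encapsulate(content: Path, container: Path, description: dict) -> None:
    """Wrap a file and its interpretive description in one ZIP container."""
    manifest = {"payload": content.name, **description}
    # ZIP_STORED keeps the payload uncompressed, so its bytes stay
    # readable even by very simple future tools.
    with zipfile.ZipFile(container, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.write(content, arcname=f"payload/{content.name}")
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))


def read_manifest(container: Path) -> dict:
    """Return the description stored alongside the payload."""
    with zipfile.ZipFile(container) as archive:
        return json.loads(archive.read("manifest.json"))
```

A description might carry fields such as the format name, character encoding, and the software originally used, which are the kinds of details a future viewer or converter would need.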


Persistent archives concept

Developed by the San Diego Supercomputer Center and funded by the National Archives and Records Administration, this method requires the development of comprehensive and extensive infrastructure that enables "the preservation of the organisation of collection as well as the objects that make up that collection, maintained in a platform independent form". A persistent archive includes both the data constituting the digital object and the context that defines the provenance, authenticity, and structure of the digital entities. This allows for the replacement of hardware or software components with minimal effect on the preservation system. This method can be based on virtual data grids and resembles the OAIS Information Model (specifically the Archival Information Package).


Metadata attachment

Metadata is data on a digital file that includes information on creation, access rights, restrictions, preservation history, and rights management. Metadata attached to digital files may be affected by file format obsolescence. ASCII is considered to be the most durable format for metadata because it is widespread, backwards compatible when used with Unicode, and utilizes human-readable characters, not numeric codes. Plain ASCII retains the information, but not the structure in which it is presented. For higher functionality, SGML or XML should be used. Both markup languages are stored in ASCII format, but contain tags that denote structure and format.
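
To make the point about structure concrete, here is a small sketch that writes a metadata record as XML serialized in ASCII. The element names and the `build_metadata` function are hypothetical and not drawn from any metadata schema; field names are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET


def build_metadata(fields: dict[str, str]) -> bytes:
    """Serialize metadata fields as an ASCII-encoded XML record.

    Characters outside ASCII are written as numeric character
    references, so the output stays pure ASCII yet parses back to
    the original text.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="us-ascii")
```

The tags preserve which value is the creator and which is the rights statement, which a flat ASCII dump of the same values would lose.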


Preservation repository assessment and certification

A few of the major frameworks for digital preservation repository assessment and certification are described below. A more detailed list is maintained by the U.S. Center for Research Libraries.


Specific tools and methodologies


TRAC

In 2007, CRL/OCLC published Trustworthy Repositories Audit & Certification: Criteria & Checklist (TRAC), a document allowing digital repositories to assess their capability to reliably store, migrate, and provide access to digital content. TRAC is based upon existing standards and best practices for trustworthy digital repositories and incorporates a set of 84 audit and certification criteria arranged in three sections: Organizational Infrastructure; Digital Object Management; and Technologies, Technical Infrastructure, and Security. TRAC "provides tools for the audit, assessment, and potential certification of digital repositories, establishes the documentation requirements required for audit, delineates a process for certification, and establishes appropriate methodologies for determining the soundness and sustainability of digital repositories".


DRAMBORA

Digital Repository Audit Method Based On Risk Assessment (DRAMBORA), introduced by the Digital Curation Centre (DCC) and DigitalPreservationEurope (DPE) in 2007, offers a methodology and a toolkit for digital repository risk assessment. The tool enables repositories to either conduct the assessment in-house (self-assessment) or to outsource the process. The DRAMBORA process is arranged in six stages and concentrates on the definition of mandate, characterization of asset base, identification of risks and the assessment of likelihood and potential impact of risks on the repository. The auditor is required to describe and document the repository's role, objectives, policies, activities and assets, in order to identify and assess the risks associated with these activities and assets and define appropriate measures to manage them.
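
The assessment of likelihood and potential impact can be pictured as a simple risk register. The sketch below is not the DRAMBORA toolkit: the 1 to 5 scales, the multiplication of likelihood by impact, and the names `Risk` and `rank_risks` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        """Assumed scoring rule: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)
```

For example, a repository might list "storage media failure" and "loss of funding" as risks, score each, and address the highest-scoring ones first.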


European Framework for Audit and Certification of Digital Repositories

The European Framework for Audit and Certification of Digital Repositories was defined in a memorandum of understanding signed in July 2010 between the Consultative Committee for Space Data Systems (CCSDS), the Data Seal of Approval (DSA) Board and the German Institute for Standardization (DIN) "Trustworthy Archives – Certification" Working Group. The framework is intended to help organizations in obtaining appropriate certification as a trusted digital repository and establishes three increasingly demanding levels of assessment:
1. Basic Certification: self-assessment using 16 criteria of the Data Seal of Approval (DSA).
2. Extended Certification: Basic Certification and additional externally reviewed self-audit against ISO 16363 or DIN 31644 requirements.
3. Formal Certification: validation of the self-certification with a third-party official audit based on ISO 16363 or DIN 31644.


''nestor'' catalogue of criteria

A German initiative, ''nestor'' (the Network of Expertise in Long-Term Storage of Digital Resources), sponsored by the German Ministry of Education and Research, developed a catalogue of criteria for trusted digital repositories in 2004. In 2008 the second version of the document was published. The catalogue, aiming primarily at German cultural heritage and higher education institutions, establishes guidelines for planning, implementing, and self-evaluation of trustworthy long-term digital repositories. The ''nestor'' catalogue of criteria conforms to the OAIS reference model terminology and consists of three sections covering topics related to Organizational Framework, Object Management, and Infrastructure and Security.


PLANETS Project

In 2002 the ''Preservation and Long-term Access through Networked Services'' (PLANETS) project, part of the EU Framework Programmes for Research and Technological Development 6, addressed core digital preservation challenges. The primary goal for ''Planets'' was to build practical services and tools to help ensure long-term access to digital cultural and scientific assets. The Open Planets project ended May 31, 2010. The outputs of the project are now sustained by the follow-on organisation, the Open Planets Foundation. On October 7, 2014 the Open Planets Foundation announced that it would be renamed the Open Preservation Foundation to align with the organization's current direction.


PLATTER

Planning Tool for Trusted Electronic Repositories (PLATTER) is a tool released by DigitalPreservationEurope (DPE) to help digital repositories in identifying their self-defined goals and priorities in order to gain trust from the stakeholders. PLATTER is intended to be used as a complementary tool to DRAMBORA, NESTOR, and TRAC. It is based on ten core principles for trusted repositories and defines nine Strategic Objective Plans, covering such areas as acquisition, preservation and dissemination of content, finance, staffing, succession planning, technical infrastructure, data and metadata specifications, and disaster planning. The tool enables repositories to develop and maintain documentation required for an audit.


ISO 16363

A system for the "audit and certification of trustworthy digital repositories" was developed by the Consultative Committee for Space Data Systems (CCSDS) and published as ISO standard 16363 on 15 February 2012. Extending the OAIS reference model, and based largely on the TRAC checklist, the standard was designed for all types of digital repositories. It provides a detailed specification of criteria against which the trustworthiness of a digital repository can be evaluated. The CCSDS Repository Audit and Certification Working Group also developed and submitted a second standard, defining operational requirements for organizations intending to provide repository auditing and certification as specified in ISO 16363. This standard was published as ISO 16919 – "requirements for bodies providing audit and certification of candidate trustworthy digital repositories" – on 1 November 2014.


Best practices

Although preservation strategies vary for different types of materials and between institutions, adhering to nationally and internationally recognized standards and practices is a crucial part of digital preservation activities. Best or recommended practices define strategies and procedures that may help organizations to implement existing standards or provide guidance in areas where no formal standards have been developed. Best practices in digital preservation continue to evolve and may encompass processes that are performed on content prior to or at the point of ingest into a digital repository as well as processes performed on preserved files post-ingest over time. Best practices may also apply to the process of digitizing analog material and may include the creation of specialized metadata (such as technical, administrative and rights metadata) in addition to standard descriptive metadata. The preservation of born-digital content may include format transformations to facilitate long-term preservation or to provide better access. No one institution can afford to develop all of the software tools needed to ensure the accessibility of digital materials over the long term. Thus the problem arises of maintaining a repository of shared tools. The
Library of Congress did that for years, until the role was assumed by the Community Owned Digital Preservation Tool Registry.


Audio preservation

Various best practices and guidelines for digital audio preservation have been developed, including:
* ''Guidelines on the Production and Preservation of Digital Audio Objects IASA-TC 04'' (2009), which sets out the international standards for optimal audio signal extraction from a variety of audio source materials, for analogue to digital conversion and for target formats for audio preservation
* ''Capturing Analog Sound for Digital Preservation: Report of a Roundtable Discussion of Best Practices for Transferring Analog Discs and Tapes'' (2006), which defined procedures for reformatting sound from analog to digital and provided recommendations for best practices for digital preservation
* ''Digital Audio Best Practices'' (2006) prepared by the Collaborative Digitization Program Digital Audio Working Group, which covers best practices and provides guidance both on digitizing existing analog content and on creating new digital audio resources
* ''Sound Directions: Best Practices for Audio Preservation'' (2007) published by the Sound Directions Project, which describes the audio preservation workflows and recommended best practices and has been used as the basis for other projects and initiatives
* Documents developed by the International Association of Sound and Audiovisual Archives (IASA), the European Broadcasting Union (EBU), the Library of Congress, and the Digital Library Federation (DLF).

The Audio Engineering Society (AES) also issues a variety of standards and guidelines relating to the creation of archival audio content and technical metadata.


Moving image preservation

The term "moving images" includes analog film and video and their born-digital forms: digital video, digital motion picture materials, and digital cinema. As analog videotape and film become obsolete, digitization has become a key preservation strategy, although many archives do continue to perform photochemical preservation of film stock. "Digital preservation" has a double meaning for audiovisual collections: analog originals are preserved through digital reformatting, with the resulting digital files preserved; and born-digital content is collected, most often in proprietary formats that pose problems for future digital preservation. There is currently no broadly accepted standard target digital preservation format for analog moving images. The complexity of digital video and the varying needs and capabilities of archival institutions are reasons why no "one-size-fits-all" format standard for long-term preservation exists for digital video as there is for other types of digital records (e.g., word-processing documents converted to PDF/A, or TIFF for images). Library and archival institutions, such as the Library of Congress and New York University
, have made significant efforts to preserve moving images; however, a national movement to preserve video has not yet materialized". The preservation of audiovisual materials "requires much more than merely putting objects in cold storage". Moving image media must be projected and played, moved and shown. Born-digital materials require a similar approach". The following resources offer information on analog to digital reformatting and preserving born-digital audiovisual content. * The Library of Congress tracks the sustainability of digital formats, including moving images. * ''The Digital Dilemma 2: Perspectives from Independent Filmmakers, Documentarians and Nonprofit Audiovisual Archives (2012)''. The section on nonprofit archives reviews common practices on digital reformatting, metadata, and storage. There are four case studies.
* Federal Agencies Digitization Guidelines Initiative (FADGI): Started in 2007, this is a collaborative effort by federal agencies to define common guidelines, methods, and practices for digitizing historical content. As part of this, two working groups are studying issues specific to two major areas, Still Image and Audio Visual.
* PrestoCenter publishes general audiovisual information and advice at a European level. Its online library has research and white papers on digital preservation costs and formats.
* The Association of Moving Image Archivists (AMIA) sponsors conferences, symposia, and events on all aspects of moving image preservation, including digital. The ''AMIA Tech Review'' contains articles reflecting current thoughts and practices from the archivists' perspectives. ''Video Preservation for the Millennia'' (2012), published in the ''AMIA Tech Review'', details the various strategies and ideas behind the current state of video preservation.
* The National Archives of Australia produced the Preservation Digitisation Standards, which set out the technical requirements for digitisation outputs produced under the National Digitisation Plan. This includes video and audio formats, as well as non-audiovisual formats.
* The Smithsonian Institution Archives published guidelines on the file formats used for the long-term preservation of electronic records; these formats are regarded as open, standard, non-proprietary, and well-established. The guidelines cover video and audio formats as well as non-audiovisual materials.


Codecs and containers

Moving images require a codec for the decoding process; therefore, choosing a codec is essential to digital preservation. In ''A Primer on Codecs for Moving Image and Sound Archives: 10 Recommendations for Codec Selection and Management'', written by Chris Lacinak and published by AudioVisual Preservation Solutions, Lacinak stresses the importance of archivists choosing the correct codec, as this can "impact the ability to preserve the digital object". The codec selection process is therefore critical, "whether dealing with born digital content, reformatting older content, or converting analog materials". Lacinak's ten recommendations for codec selection and management are the following: adoption, disclosure, transparency, external dependencies, documentation and metadata, pre-planning, maintenance, obsolescence monitoring, maintenance of the original, and avoidance of unnecessary transcoding or re-encoding. To date, the archival community has reached no consensus on a standard codec for the digitization of analog video and the long-term preservation of digital video, nor is there a single "right" codec for a digital object; each archival institution must "make the decision as part of an overall preservation strategy".

A digital container format, or wrapper, is also required for moving images and must be chosen as carefully as the codec. According to an international survey conducted in 2010 of over 50 institutions involved with film and video reformatting, "the three main choices for preservation products were AVI, QuickTime (.MOV) or MXF (Material Exchange Format)". These are just a few examples of containers. The National Archives and Records Administration (NARA) has chosen the AVI wrapper as its standard container format for several reasons, including that AVI files are compatible with numerous open source tools such as VLC. Uncertainty about which formats will become obsolete and which will become the future standard makes it difficult to commit to one codec and one container. Choosing a format should "be a trade off for which the best quality requirements and long-term sustainability are ensured."


Considerations for content creators

By considering the following steps, content creators and archivists can ensure better accessibility and preservation of moving images in the long term:
* Create uncompressed video if possible. While this does create large files, their quality will be retained. Storage must be considered with this approach.
* If uncompressed video is not possible, use lossless instead of lossy compression. With lossless compression the original data can be restored exactly, whereas lossy compression alters the data and quality is lost.
* Use higher bit rates (this affects the quality of the image and the size of the file).
* Use technical and descriptive metadata.
* Use containers and codecs that are stable and widely used within the archival and digital preservation communities.
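The distinction between lossless and lossy compression in the list above can be illustrated with a toy sketch. It is not a video codec: `zlib` stands in for any lossless scheme, and discarding the low-order bits of each sample stands in for a lossy one. The function names and the sample "frame" are hypothetical and chosen only for illustration.

```python
import zlib


def lossless_roundtrip(samples: bytes) -> bytes:
    """Compress and decompress with zlib; the output equals the input exactly."""
    return zlib.decompress(zlib.compress(samples, 9))


def lossy_roundtrip(samples: bytes, keep_bits: int = 4) -> bytes:
    """Toy lossy scheme (an assumption for illustration): keep only the
    top `keep_bits` bits of each 8-bit sample, discarding the rest."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(b & mask for b in samples)


# A stand-in for one row of 8-bit pixel values.
frame = bytes(range(256))

restored = lossless_roundtrip(frame)
degraded = lossy_roundtrip(frame)

print("lossless identical to original:", restored == frame)
print("lossy identical to original:   ", degraded == frame)
```

Once the low-order bits have been discarded there is no way to recover them from the stored file, which is why the list prefers uncompressed or lossless masters where storage allows.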


Email preservation

Email poses special challenges for preservation: email client software varies widely; there is no common structure for email messages; email often communicates sensitive information; individual email accounts may contain business and personal messages intermingled; and email may include attached documents in a variety of file formats. Email messages can also carry viruses or have spam content. While email transmission is standardized, there is no formal standard for the long-term preservation of email messages.

Approaches to preserving email may vary according to the purpose for which it is being preserved. For businesses and government entities, email preservation may be driven by the need to meet retention and supervision requirements for regulatory compliance and to allow for legal discovery. (Additional information about email archiving approaches for business and institutional purposes may be found under the separate article, Email archiving.) For research libraries and archives, the goal of preserving email that is part of born-digital or hybrid archival collections is to ensure its long-term availability as part of the historical and cultural record.

Several projects have developed tools and methodologies for email preservation based on various preservation strategies: normalizing email into XML format, migrating email to a new version of the software, and emulating email environments. These projects include Memories Using Email (MUSE), the Collaborative Electronic Records Project (CERP), EMCAP, the PeDALS Email Extractor Software (PeDALS), and the XML Electronic Normalizing of Archives tool (XENA).

Some best practices and guidelines for email preservation can be found in the following resources:
* ''Curating E-Mails: A Life-cycle Approach to the Management and Preservation of E-mail Messages'' (2006) by Maureen Pennock.
* ''Technology Watch Report 11-01: Preserving Email'' (2011) by Christopher J Prom.
* ''Best Practices: Email Archiving'' by Jo Maitland.
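As a rough illustration of the "normalizing email into XML" strategy mentioned in this section, the following minimal sketch parses a single plain-text message with Python's standard library and writes its headers and body as XML. The element names (`message`, `headers`, `header`, `body`), the function name `email_to_xml`, and the sample message are all assumptions made for illustration; they do not reproduce the schema or behavior of any of the projects named above, and attachments are not handled.

```python
from email import message_from_string
import xml.etree.ElementTree as ET

# Hypothetical sample message used only for this sketch.
RAW = """\
From: alice@example.org
To: bob@example.org
Subject: Minutes
Date: Mon, 02 Jan 2012 10:00:00 +0000

See attached notes.
"""


def email_to_xml(raw: str) -> str:
    """Convert one raw RFC 822-style message into a simple XML string."""
    msg = message_from_string(raw)
    root = ET.Element("message")

    # Record every header in order, keeping the header name as an attribute.
    headers = ET.SubElement(root, "headers")
    for name, value in msg.items():
        header = ET.SubElement(headers, "header", name=name)
        header.text = value

    # Only single-part text bodies are captured; multipart messages
    # (for example, those with attachments) are left with an empty body.
    body = ET.SubElement(root, "body")
    payload = msg.get_payload()
    body.text = payload if isinstance(payload, str) else ""

    return ET.tostring(root, encoding="unicode")


print(email_to_xml(RAW))
```

A real normalization workflow would also need to deal with character encodings, multipart bodies, and attachments in varied file formats, which are among the challenges this section describes.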


Video game preservation

In 2007 the ''Keeping Emulation Environments Portable'' (KEEP) project, part of the EU's Seventh Framework Programme for Research and Technological Development (FP7), developed tools and methodologies to keep digital software objects available in their original context. Digital software objects such as video games might get lost because of digital obsolescence and the non-availability of required legacy hardware or operating system software; such software is referred to as abandonware. Because the source code is often no longer available, emulation is the only preservation opportunity. KEEP provided an emulation framework to help the creation of such emulators. KEEP was developed by Vincent Joguin, first launched in February 2009, and was coordinated by Elisabeth Freyre of the French National Library.

A community project, MAME, aims to emulate any historic computer game, including arcade games, console games and the like, at a hardware level, for future archiving.

In January 2012 the POCOS project funded by JISC organised a workshop on the preservation of gaming environments and virtual worlds.


Personal archiving

There are many things consumers and artists can do themselves to help care for their collections at home.
* The Software Preservation Society is a group of computer enthusiasts that concentrates on finding old software disks (mostly games) and taking a snapshot of the disks in a format that can be preserved for the future.
* "Resource Center: Caring For Your Treasures" by the American Institute for Conservation of Historic and Artistic Works details simple strategies for artists and consumers to care for and preserve their work themselves.
The Library of Congress also hosts a list for the self-preserver, which points toward programs and guidelines from other institutions that help the user preserve social media and email, along with general formatting guidelines (such as caring for CDs). Some of the programs listed include:
* HTTrack: a software tool that allows the user to download a World Wide Web site from the Internet to a local directory, recursively building all directories and getting HTML, images, and other files from the server onto their computer.
* Muse: Muse (short for Memories Using Email) is a program, run by Stanford University, that helps users revive memories using their long-term email archives.


Scientific research

In 2020, researchers reported in a preprint that they found "176 Open Access journals that, through lack of comprehensive and open archives, vanished from the Web between 2000-2019, spanning all major research disciplines and geographic regions of the world" and that in 2019 only about a third of the 14,068 DOAJ-indexed journals ensured the long-term preservation of their content. Some scientific research output is located not on the scientific journal's website but on other sites, such as source-code repositories like GitLab. The Internet Archive has archived many, but not all, of the lost academic publications and makes them available on the Web. According to an analysis by the Internet Archive, "18 per cent of all open access articles since 1945, over three million, are not independently archived by us or another preservation organization, other than the publishers themselves". Sci-Hub does academic archiving outside the bounds of contemporary copyright law and also provides access to academic works that do not have an open access license.


Digital building preservation

"The creation of a 3D model of a historical building needs a lot of effort." Recent advances in technology have led to the development of 3-D rendered buildings in virtual space. Traditionally, the buildings in video games had to be rendered via code, and many game studios have produced highly detailed renderings (see Assassin's Creed). However, because most preservationists do not have teams of professional coders at their disposal, universities have begun developing methods based on 3-D laser scanning. Such work was attempted by the National Taiwan University of Science and Technology in 2009. Their goal was "to build as-built 3D computer models of a historical building, the Don Nan-Kuan House, to fulfill the need of digital preservation." With considerable success, they scanned the Don Nan-Kuan House with bulky 10 kg (22 lbs.) cameras, needing only minor touch-ups where the scanners were not detailed enough.

More recently, in 2018 in Calw, Germany, a team scanned the historic Church of St. Peter and Paul by collecting data via laser scanning and photogrammetry. "The current church's tower is about 64 m high, and its architectonic style is neo-gothic of the late nineteenth century. This church counts with a main nave, a chorus and two lateral naves in each side with tribunes in height. The church shows a rich history, which is visible in the different elements and architectonic styles used. Two small windows between the choir and the tower are the oldest parts preserved, which date to thirteenth century. The church was reconstructed and extended during the sixteenth (expansion of the nave) and seventeenth centuries (construction of tribunes), after the destruction caused by the Thirty Years' War (1618-1648). However, the church was again burned by the French Army under General Mélac at the end of the seventeenth century. The current organ and pulpit are preserved from this time. In the late nineteenth century, the church was rebuilt and the old dome Welsch was replaced by the current neo-gothic tower. Other works from this period are the upper section of the pulpit, the choir seats and the organ case. The stained-glass windows of the choir are from the late nineteenth and early twentieth centuries, while some of the nave's windows are from middle of the twentieth century. Second World War having ended, some neo-gothic elements were replaced by pure gothic ones, such as the altar of the church, and some drawings on the walls and ceilings." This architectural variety presented both a challenge and a chance to combine different technologies in a large space with the goal of high resolution. The results were good and are available to view online.


Education

The Digital Preservation Outreach and Education (DPOE), as part of the Library of Congress, serves to foster preservation of digital content through a collaborative network of instructors and collection management professionals working in cultural heritage institutions. Composed of Library of Congress staff, the National Trainer Network, the DPOE Steering Committee, and a community of Digital Preservation Education Advocates, the DPOE had 24 working trainers across the six regions of the United States as of 2013. In 2010 the DPOE conducted an assessment, reaching out to archivists, librarians, and other information professionals around the country. A working group of DPOE instructors then developed a curriculum based on the assessment results and on similar digital preservation curricula designed by other training programs, such as LYRASIS, Educopia Institute, MetaArchive Cooperative, University of North Carolina, DigCCurr (Digital Curation Curriculum) and Cornell University-ICPSR Digital Preservation Management Workshops. The resulting core principles are also modeled on the principles outlined in "A Framework of Guidance for Building Good Digital Collections" by the National Information Standards Organization (NISO).

In Europe, Humboldt-Universität zu Berlin and King's College London offer a joint program in Digital Curation that emphasizes both digital humanities and the technologies necessary for long-term curation. The MSc in Information Management and Preservation (Digital) offered by the HATII at the University of Glasgow has been running since 2005 and is the pioneering program in the field.


Examples of initiatives

* The Library of Congress founded the National Digital Stewardship Alliance, which is now hosted by the Digital Library Federation.
* The British Library is responsible for several programmes in the area of digital preservation and is a founding member of the Digital Preservation Coalition and Open Preservation Foundation. Their

is publicly available. The National Archives of the United Kingdom have also pioneered various initiatives in the field of digital preservation.
* The Centre of Excellence for Digital Preservation is established at C-DAC, Pune, India as a flagship project under the National Digital Preservation Program (NDPP) sponsored by the Ministry of Electronics & Information Technology, Government of India.
A number of open source products have been developed to assist with digital preservation, including Archivematica, DSpace, Fedora Commons, OPUS, SobekCM and EPrints. The commercial sector also offers digital preservation software tools, such as Ex Libris Ltd.'s ''Rosetta'', Preservica's Cloud, Standard and Enterprise Editions, CONTENTdm, Digital Commons, Equella, intraLibrary, Open Repository and Vital.


Large-scale initiatives

Many research libraries and archives have begun or are about to begin large-scale digital preservation initiatives (LSDIs). The main players in LSDIs are cultural institutions, commercial companies such as Google and Microsoft, and non-profit groups including the Open Content Alliance (OCA), the Million Book Project (MBP), and HathiTrust. The primary motivation of these groups is to expand access to scholarly resources.

Approximately 30 cultural entities, including the 12-member Committee on Institutional Cooperation (CIC), have signed digitization agreements with either Google or Microsoft. Several of these cultural entities are participating in the Open Content Alliance and the Million Book Project. Some libraries are involved in only one initiative and others have diversified their digitization strategies through participation in multiple initiatives. The three main reasons for library participation in LSDIs are access, preservation, and research and development. It is hoped that digital preservation will ensure that library materials remain accessible for future generations. Libraries have a responsibility to guarantee perpetual access to their materials and a commitment to archive their digital materials. Libraries plan to use digitized copies as backups for works in case they go out of print, deteriorate, or are lost or damaged.


Arctic World Archive

The Arctic World Archive is a facility for the preservation of historical and cultural data from several countries, including open source code.


See also

* Backup
* Charles M. Dollar
* Data curation
* Data preservation
* Database preservation
* Digital artifactual value
* Digital asset management
* Digital curation
* Digital continuity
* Digital dark age
* Digital library
* Digital obsolescence
* Digital reformatting
* Digitization
* DRAMBORA
* Enterprise content management
* ENUMERATE
* File format
* HD-Rosetta
* Information Lifecycle Management
* List of digital preservation initiatives
* New media art preservation
* Margaret Hedstrom
* Preservation metadata
* Section 108 Study Group
* Seamus Ross
* Slow fire
* Trustworthy Repositories Audit & Certification
* UVC-based preservation
* Web archiving
* W.O.R.F. (Write Once Read Forever)


Footnotes


References

* Expanded version of ''Ensuring the Longevity of Digital Documents''.
* Milne, Ronald, moderator. Webcast panel discussion, "Economics," ''Scholarship and Libraries in Transition: A Dialogue about the Impacts of Mass Digitization Projects'' (2006). Symposium sponsored by the University of Michigan Library and the National Commission on Libraries and Information Science (US).


External links


National Digital Information Infrastructure and Preservation Program at the Library of Congress

DPOE - Digital Preservation Outreach & Education at the Library of Congress

Digital Preservation page from the Digital Library Federation

"Thirteen Ways of Looking at...Digital Preservation"



What is Digital Preservation? - an introduction to digital preservation by Digital Preservation Europe

Macroscopic 10-Terabit–per–Square-Inch Arrays from Block Copolymers with Lateral Order. ''Science'' magazine article about the prospective use of sapphire in digital storage media technology
Animations introducing digital preservation and curation

Capture Your Collections: Planning and Implementing Digitization Projects. A CHIN (Canadian Heritage Information Network) resource
Digitales Archiv Hessen. Digital preservation page by Hessisches Hauptstaatsarchiv Wiesbaden
"Land of the lost: a discussion of what can be preserved through digital preservation." Nick del Pozo, Andrew Stawowczyk Long, David Pearson.
Various activities in digital preservation at the University of Cologne (professorship for Applied Computer Science in the Humanities)

Challenges in AV Digitization and Digital Preservation